Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement FastCircle component #6290

Merged
merged 21 commits into from
Aug 30, 2024
Merged

Implement FastCircle component #6290

merged 21 commits into from
Aug 30, 2024

Conversation

EVAST9919
Copy link
Contributor

@EVAST9919 EVAST9919 commented May 19, 2024

Adresses #6067, ppy/osu#21495

Pros:

  • No nested Box drawable (adds up in demanding scenarios)
  • Correct masking when used inside CircularContainer
  • No overhead from masking info and no batch breaking in cases when multiple circles are being drawn consecutively (though I think SSBO supposed to fix that?)

Cons (which are holding me back from replacing current Circle implementation with this one):

  • Circle is no longer a container (so is the Box which I think is fine)
  • No ability to set masking-related effects directly (again Box has the same properties)
  • Points above might be considered a breaking change, but easily fixeable by using CircularContainer instead
  • Texture coordinates abuse to pass draw size into the shader without uniform buffers
  • Has a separate shader

Performance comparison in demanding scenarios:

master pr
Снимок экрана (524) Снимок экрана (525)
master-mania pr-mania
master-mania-2 pr-mania-2
master-slider pr-slider

@bdach
Copy link
Collaborator

bdach commented May 19, 2024

Just reading the title of this makes me want to go "no why is this being proposed even". I'd hope we can not resort to cpp style microoptimisations like this. But maybe it's just me 🤷

I'd be much more okay with this replacing Circle as long as there is an actual perf gain and the presumed increase in draw calls from shader rebinding doesn't kill the perf gain. The other tradeoffs mentioned sound fine to me.

@EVAST9919
Copy link
Contributor Author

EVAST9919 commented May 19, 2024

I'd be much more okay with this replacing Circle as long as there is an actual perf gain and the presumed increase in draw calls from shader rebinding doesn't kill the perf gain. The other tradeoffs mentioned sound fine to me.

My current proposals:

  • We probably can do some hackery like providing some specific flag somewhere and use the same texture shader with reroute to the new one and thus shader switch will be eliminated, but I can't think of any sane and good-looking solutions. I have a working implementation which relies on unused y component of BlendRange being set to a unique value, however it's probably too ugly.
  • We can leave this as a side component (probably with a better name and xmldoc) and use it only in specific cases, which is what I'm doing in provided screenshots
  • Wait for SSBO implementation and see performance difference then

Comment on lines +16 to +23
highp vec2 pixelPos = v_TexRect.zw * 0.5 - abs(v_TexCoord - v_TexRect.zw * 0.5);
highp float radius = min(v_TexRect.z, v_TexRect.w) * 0.5;

highp float dst = max(pixelPos.x, pixelPos.y) > radius ? radius - min(pixelPos.x, pixelPos.y) : distance(pixelPos, vec2(radius));

highp float alpha = v_BlendRange.x == 0.0 ? float(dst < radius) : (clamp(radius - dst, 0.0, v_BlendRange.x) / v_BlendRange.x);

o_Colour = getRoundedColor(vec4(vec3(1.0), alpha), vec2(0.0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the use of v_TexRect here.

  • Have you tested textures at all? It looks like you're not actually sampling the texture.
  • What about cropped textures? (Texture.Crop())
  • Does this work for rotated ellipses (textured and non-textured)? I assume so but it should be checked.

Copy link
Contributor Author

@EVAST9919 EVAST9919 May 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the thing: I'm abusing texture coordinates to pass screenspace size to the shader without uniforms (to compute antialiasing). This new circle isn't a sprite, so textures can't be set. (in my defence current circle isn't a sprite either)
Regarding rotation: I'll add test showing it's working.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: I'm not trying replicate CircularContainer, I'm simplifying one particular usage of it, which is CircularContainer with a plane Box with no texture inside (Circle class) to avoid masking overhead in cases where multiple circes are being drawn consecutively (GPU side) and reducing total drawable count (CPU side)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the thing: I'm abusing texture coordinates to pass screenspace size to the shader without uniforms (to compute antialiasing).

As a side note I can apply hacks explored in this pr to the Triangles shader since it's being widely used in the game to eliminate uniform buffer. Will pr after checking whether performance gain is worth doing so.

@bdach
Copy link
Collaborator

bdach commented May 21, 2024

My current proposals:

If this "fast" circle can't be applied universally then I'm not super interested in this imo.

@smoogipoo
Copy link
Contributor

smoogipoo commented May 21, 2024

My current position on this is I'm not sure where it fits in right now but I'm not totally opposed to it. Or at least I'm not totally opposed to the idea of having fast untextured primitives.

@EVAST9919
Copy link
Contributor Author

I'd be much more okay with this replacing Circle as long as there is an actual perf gain and the presumed increase in draw calls from shader rebinding doesn't kill the perf gain. The other tradeoffs mentioned sound fine to me.
If this "fast" circle can't be applied universally then I'm not super interested in this imo.

So I've added the "worst case" performance test scene, in which drawables are alternating between circle and the box since I haven't noticed much difference in game when replacing circle with my implementation.
In this "worst case" scenario draw call count is the same and for my machine specifically my version runs faster. If you are fine with that then I can update this pr to replace original circle instead.

Circle FastCircle
Circle Fast

@smoogipoo
Copy link
Contributor

smoogipoo commented May 22, 2024

The part that stops me from replacing Circle is the lack of a texture. It's the reason I questioned it in my review.

Slight tidbit: different systems handle draw calls very differently and it's very likely yours handles much better than, say, a mobile GPU.
That's to say that, yes in the current state it has the worst case number of draw calls as Circle, but if SSBO reduces it then the current implementation may start performing faster. But the SSBO gains are only theoretical right now.

@peppy
Copy link
Member

peppy commented May 22, 2024

a separate shader is going to interrupt batching, so we need to be sure that's what we want (probably not?)

@smoogipoo
Copy link
Contributor

smoogipoo commented May 22, 2024

What you have to understand is that it already interrupts batching as a result of masking. It all depends on whether the gains of SSBO can be realised.

@EVAST9919
Copy link
Contributor Author

EVAST9919 commented May 22, 2024

The part that stops me from replacing Circle is the lack of a texture. It's the reason I questioned it in my review.

Which current Circle doesn't have either, so we are not losing anything. (And there's no room left for texture quads without adding uniform buffers which is as bad for performance as masking)

a separate shader is going to interrupt batching

There's a way to use the same shader, but I'm not sure if it's too ugly or some better approach can be implemented.

I have a working implementation which relies on unused y component of BlendRange being set to a unique value, however it's probably too ugly.

@EVAST9919

This comment was marked as outdated.

@EVAST9919
Copy link
Contributor Author

Hmm, found discrepancy in behaviour

Okay, this one is easily fixable.
But also: while I can see how this component is too niche to include with the framework and potentially may cause performance issues upon misuse, I think it will be beneficial to at very least use it locally in osu! in the scenarios shown in OP, where we are sure that multiple circles will be drawn consecutively in a single draw call.
I have branches ready and can proceed further if necessary.

@smoogipoo
Copy link
Contributor

This PR's still in my emails fwiw, just very low priority while working on the many other tasks... I do eventually want to come back and re-read through the above comments + the capabilities offered by the existing Circle.

I may be okay blindly merging this in as FastCircle with documentation that it behaves slightly differently to Circle and is subject to change, for the specific scenarios that warrant it.

Copy link
Contributor

@smoogipoo smoogipoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge this one as it is for now to see where things go. Thanks!

@smoogipoo smoogipoo merged commit b281542 into ppy:master Aug 30, 2024
19 of 21 checks passed
@peppy
Copy link
Member

peppy commented Aug 30, 2024

Hmm, was kinda hoping this would just replace Circle. I really don't want to have to go through code and decide when we want to use FastCircle vs Circle. Would prefer to just fix the few broken cases of Circle usage with the new implementation if that's possible.

@smoogipoo
Copy link
Contributor

smoogipoo commented Aug 30, 2024

My point is I don't know if this is the best that can be done. It still breaks the batch after all, and in a way that is much harder to prevent. 99% of cases would be fine with just using Circle, there are very specific use-cases that would currently benefit majorly from this, and even those cases have to be structured in a special way.

@peppy
Copy link
Member

peppy commented Aug 30, 2024

Fair enough

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants